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ABSTRACT 


The quasi-Monte Carlo method of integration offers an attractive 
solution to the problem of evaluating integrals in a large number of 
dimensions; however, the associated error bounds are difficult to 

2 

obtain theoretically. Since these bounds are associated with the L 

discrepancy of the set of points used in the integration, this paper 

2 

presents numerical calculations of the L discrepancy for several types 
of quasi-Monte Carlo formulae. 
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The quasi-Monte Carlo method of evaluation of a multiple integral 
consists of averaging function values over a well-distributed set of 
points in the region of integration. This method differs from classical 
methods in which a part of some representation (polynomial, Fourier series) 
of the function is integrated exactly, and from the ordinary Monte Carlo 
method where a "random" sample of function values is averaged. Error 
bounds for quasi-Monte Carlo integration can be based on various measures 
of the inequity in the distribution of the set of points over which the 
function is averaged. Any integration formula may be treated as quasi- 
Monte Carlo by using the bounds discussed below. 

This paper gives the results of some computational studies of the 

2 

L discrepancy of several types of sets of points which have been sug- 
gested as being suitable for quasi-Monte Carlo integration of functions 
in the unit K-cube. 

The local discrepancy, g(£)> at a point £ in the unit K-cube 

(0 < <1, i = 1, ... K) , of a set of points X = (x } , m = 1, . N, 

i — ~ ~m 

is defined by 

-1 K 

g(0 = N V(D - n 5 (i) 

i=l 

where \>(£) is the number of points of X whose coordinates satisfy: 

0 < x^ <_ £ , [1], [2]. The function g(0 is a measure of the local 

unevenness in the distribution of the points of X . Figure 1 illustrates 
g(0 in two dimensions. 
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Figure 1 

N = 8, v(.6, .5) = 3 

g(.6, .5) = 3/8-. 6 x .5 - .075 


Various norms of g(Q taken oven the unit K-cube give a global measure 
of unevenness and may be used to express error bounds for quasi-Monte 
Carlo integration, [1], [2], [3], [4]. 

The error, e ,. for quasi-Monte Carlo integration is given by 

N 11 

e = | N Z f (x ) - / ... / f(|)d? 1 ...d5 K | . (2) 

m=l ~ o o 


Although the function g(£) is not differentiable, it can be treated as 
a generalized function so that the following derivation of the error 
bounds is correct, [1], [5]. the same result has been obtained in a 
different manner by Zaremba in [3]. The proof is given below for the 
case of two dimensions but the same argument generalizes to more dimen- 
sions without any essential change. Equation (2) can be written in the 
case of two dimensions 

11 2 

£'= | 7 / f(q, y - a - s(C L’ 9 d^d^- | - - - - (3) 

o o 3 Cj.3£ 2 
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where the derivatives of g(£) are generalized derivatives, [5]. Integra- 
tion of (3) by parts gives 

n^f^.g i 3fd,5 9 ) i af(ga) 

e= | / / g (^l’^2' )d ^l d ^2 " £ g(l,? 2 ^ d ^2 “ £ ^ •' n 1 gC^.l)^! 


(4) 


since the other terms from the integration by parts vanish identically, 
[3]. Application of the triangle inequality to (4) gives 


1 I / / ^ W g(g ,£ )dg dgj+l / af ^ 1 »^2 ) g(l,g )dg 1 + 1/ 9£(g r 1} g(g.l)4€J, 

o o 11 O 35, o 3?, f 


(5) 

From the definition of g(0 , it can be seen that gCl.gj) anc * g(£^,l) 
are the local discrepancies of the points X ={ x m j.» x m 2 ^ Projected onto 
the ^2 an d axes respectively. 

Of the possible bounds obtainable from (5) , the one discussed here*. 

comes from an application of the Cauchy-Schwarz inequality to (5), [3]. 

2 2 
The L norm of g(£), denoted by T(X) , (the L or meansquare discrepancy 

of X) is defined by 

T(X) = p.../ Ig(|)] 2 d5 1 ... d S K ^ 1/2 t (6) 


Combining (6) with (5) using the Cauchy-Schwarz inequality gives the 
bound, in two dimensions. 
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e < 



r 3 2 f(S 1 ,£ 2 ) 


a5 l 35 2 


1/2 


d ^2( 


f 1 S f/c n 2 V /2 l 1 3f 2 I 

T(p+|/[ d 3^ >lj r dg 1 1 ^ T 1 (X)+)/[— ]d^2j 


1 1/2 


J 


1‘ 


J 


t 2 (x) 


(7) 

2 

In (7), T^(X) and T 2 (X) are t ' ie ^ discrepancies of the projections of 
X onto the and ^ axes. 

It is possible to obtain bounds on e in terms of the extreme 

discrepancy, D(X) = q< 5<1 > and t ^ le tota i variation in the sense of 

Hardy and Krause of f(D , [1], . [3] , [6]. Because T(X) is much easier 

to compute than D(X) , this paper will deal with T(X) only. 

In Equation (7) the partial derivatives with some = 1 may be 

replaced by those with that 5. = 0 by making the change of variable 

1 1 1 * 

s' - 1 - 5, . Since / f(S.)d5. - / f (5 '■)(!?, and 3£(g i* 3f (g P 

1 1 o 1 1 o 1 1 3?. " ~ 35/ ' 

x i 

Equation (7) is valid as only the squares of the partial derivatives are 
used. 

In order to utilize the bounds, (7),. in numerical calculations, it 

is necessary to have some idea of the behavior of T(X) as a function 

of N . A few theoretical results are available. Roth has shown that 
K-l 

-1 2 

T(X) >_ C-^ (£nN) for any set of N points X in K dimensions; the 

C K are independent of N , [9]. Sequences have been suggested for which 

_ K.— 1 

T(X) _< Bjjil (£nN) “where the ~'B~ depend- only- on- K--[ 10-]-. -One- sequence 

-1 1/2 

has been constructed in two dimensions with T(X)~ AN (£nN) with A 
a constant [2]. The theoretical behavior of T(X) for a random sequence 

is T(X) = [(1/2)^ - (1/3)^] ^ 2 N ^ 2 ; [1]. In view of the difficulty 
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of obtaining theoretical values for T(X) , this paper gives the results 
of computations of T(X) for various sequences which have been suggested 
in the literature. Also discussed are some new sequences based on com- 
putational experience. 

For purposes of computation, it is more convenient to work with 
2 2 

the expression for N T (X) , denoted by J (X) . Formulae for efficient 
computation of J (X) in terms of the coordinate values of X are given 
below, [7], [2]. By using the Heaviside function H(z) = 1 for z > 0 
and H(z) = 0 for z < 0 , g(£) can be written as 


SCO = N 


-1 


N K 
Z 


K 

n 


n h( 5 - x ) - 

m=l i=l 1 mi i-1 


( 8 ) 


Now J (X) becomes 


1 1 N N K 

j(x) -/.../[ z z n H(5 £ - 

o o m=l n=l i=l 


X ,)H(£ 
mi l 


x )-2N 
ni 


N 

Z 


K 

n 


m=l i=l 


e.H(e. 

i i 


x hi ) 


+ N 


2 


K 

n 

i=l 



] d? ± ... d5 


K ' 


(9) 


Since each factor in each term of the integrand depends upon only one 
integration variable , the integration of each factor may be carried 
out separately to yield 


J(X) 


N N 
Z Z 

m=l n=l 


II [l-max(k .»*.)]- 2 ^N 
. . mi ni 

i=l 


N 

Z 

m=l 



( 10 ) 
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For faster computation (10) may be written as 


1 N N K N K _ K+ , 

J (X) = ± 2 2 n [ 1-max (x .,x ,)]+ E IT (l- x ,)-2N K 

m=l n=l i=l m=l i=l 


N 

E 


K 


m=l x=l 


+ 3-% 2 


( 11 ) 


The time required for evaluation of J (X) using (11) is roughly 
2 

proportional to N . For a sequence such as the Halton sequence, [3], 
it is more efficient to compute J(X) for N points from that for 
N-l points. Equations (12a - 12e) outline this calculation. 

(12a) J(X) = P(N) - 2“ K+1 NQ(N) + 3 mK N 2 

^[1 - max( Xini> x N1 )3 + ^ (1 - ^ 

(12e) Q(N) = Q(N - 1) + n (1 - x?,) 

i=l 

These equations the computation of J(X) in a time roughly proportional 
to N 2 for each set of M points from M = 1, . . . N , where each successive " 
point is added to the previous set without altering the coordinates of 
the previous points. A somewhat more elaborate set of formulae (13a - 13i) 
may be used to achieve a similar savings in time for a sequence such as 


K 

d2b) p(i) = n (i - x ) 

i=l 1 


(12c) Q ( 1) - n (1 - x^) 

i=l 

N-l 

(12d) P (N) - P(N - 1) + 2 E 

m=l 

K 
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Hammers ley ' s , [3] , where the first coordinate is always equally spaced 
at intervals of l/N. 


(13a) 

J(X) = P(N) 

- N 1 P(N) 

- 2“ K+1 NQ(N) + 2 _K+1 

N 2 Q(N) + 3" K N 2 



K 



(13b) 

?(1) = P(l) 

= n (i - 
1=2 

X li> 


(13 c) 

Q(l) = Q(l) 

K 

- n (i - 

i=2 

2 x 

X li ) 



N-l 

K 

K 


(13d) 

AP = 2 Z 
*1 

n [1 - max(x tn ., XNi )] + n (1 - 



(13e) 

aq = n (i - 

i=2 

X Ni ) 

(13f ) 

P (N) - P(N - 

1) + AP 

(13g) 

P (N) = P (N - 

1) + NAP 

(13h) 

Q(N) = Q(N - 

1) + aq 

(13i) 

Q(N) = Q(N - 

1) + n 2 aq 


Figures 2 through 9 are plots of T(X) vs. N for several sequences 
in 2 through 9 dimensions. The values of T(X) are plotted for N 
running from 25 to 1000 in steps of 25, except for the good lattice 
sequences where N is not necessarily a multiple of 25 and the sequences 
which were derived by computation where all values of N were not tried. 
Figure 10 is an extension of Figure 9 to 2000 points only sequences 2, 

6, and 7 are plotted. Sequence 4 is slightly below Sequence 2 but not 
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enough to show on the graph. Figure 11 is an expanded version of 
Figure 4 showing T(X) at each N for N running from 150 to 190. 

The solid line represents T(X) for a cannonical random sequence. All 
the points are not plotted since otherwise the graphs would become rather 
crowded. 

Each sequence described below consists of sets of N points x^ 

where m = 0, ... N - 1. The first point X q is placed at (1, 1, ...) 

instead of (0, 0, ...) as is usual because computational experience 

suggests that this will give a lower value of T(X) in general. 

Sequence 1, 0 on the graphs, is the Halton sequence [3], 

defined in K dimensions by x^ = ((^(m), <j> 3 (m), ... <j> p (m)) where the 

K 

P's are the first K primes. The function <j> r (m) is the radical- inverse 

2 

function of m to the base r . If an integer m = + a^r + a^r + ..., 

where the a^ are uniquely defiend integers in {0, 1, ... r-l} for any 

-1 -2 

integer radix r 2, then <f> r (m) + a^r ... . The function is 

equivalent to taking the r-ary representation of the number m and 
reflecting the digits about the radical point. For example 10 = 1010 2 
(numbers without subscript are base ten) so <j> 2 (10) ** 0.101 2 or 5/16 ; 
and 11 = 102 3 so <^(11) - .201 3 = 19/27. 

Sequence 2, O , is the Hammers ley sequence defined by 
X = (m/N, <f> 9 (m), ... <J> (m))» [3]. 

.. . Sequence 3,.., W_. , is a modified generalization of a idea _due to 
Zaremba, [2] . The function ip^Cm) , the folded radical inverse function, 

is defined similarly to <j> (m) . If the representation of m base r 

2 -1 
is m = a Q + a^r + a 2 r + ... then ^ r (m) = (a Q + 0) mod r r 

+ (a. + 1) , r 2 + ( a + i) r * ^ + . . . . The ip differ from 

1 mod r 1 mod r r 
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the 4> r in that each coefficient has its index, i , added to modules 

r in the expansion of ip . For example ^(lO) = . OOOOO^ or 1/48 and 

1^(11) = . 210012^ or 551/702 , the [superscribed] line represents 

an infinitely repeated sequence of digits. The sequence is 

X = (\p (m) , . . . ip p (m)) which is analogous to the Halton sequence. 

~ m 1 K 

Sequence 4 , ® , is the sequence x = (m/N (m) , . . . \p p (m)) 

~ m K-l 

which is analogous to the Hammersley sequence. 


Sequence 5 , 




, is the sequence x 


m 




m(m+l) 




where {a} denotes the fractional part of C t . This 
sequence has been suggested by Haber as a pseudo-random sequence [11]. 
Sequence 6 , A , is the sequence x m = (fm/rj, ...fmJfA). 
Sequence 7 , ^ , is constructed from the Univac 1108 pseudo- 
random number generator at the University of Wisconsin. The generator 

15 

is a mixed-congruential type u^ = (5 u £_i + ^mod 2^ * [12]. The 

u^ are supposed to be uniformly distributed integers on the interval 
35 35 

(0,2 -1) so that v^ = u^/2 will be distributed uniformly on (0,1). 

The sequence is x = (v(m-l)K+l, v(m-l) K+2, . . .v(m-K)) . The 
~m 


initial u = 5 
o 
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The sets of points in Figures 2, 3, and 4 denoted by z are the 
good lattice points of Zaremba. The two dimensional sequence is de- 


scribed in [13] and is the sequence x 


m 


( 


,_m £mF£_i 

’ t h 


) where 


'A 


is the £th Fibonacci number. The lattices used in three and four 


dimensions were obtained computationally [14]. 

The sets of points in Figures 2 and 3 denoted by + were computed 
from Sequence 4 by itteratively applying the following formula: 
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x 

pq 


[2- k+2 x n (i - 4,)]- 1 2 ? h<* -* > i Ii-IWl 

1=1 m=l ^ i=l 

i^q i^q 


K 

+ n (i - Xp.) 

i=i 

i^q 

This formula is obtained by considering J (X) to be a function of the 
variable x only. Then J(X) is a parabola in x and the formula 

pq ~ pq 

gives the minimum of this parabola. The computation takes a time roughly 

2 

proportional to N to process a set of N points. 

An inspection of the graphs shows that in two through seven dimen- 
sions the sequences based on the various radical inverse functions, 

Sequences 1, 2, 3, 4, are better than those of the pseudo-random number 
generator or those based on the square roots of primes. In eight 
dimensions the two radical inverse sequences with equally spaced first 
dimension do better than the others but the two with radical inverse 
functions only do not do so well, at least to 1000 points. In nine 
dimensions the radical inverse sequences do not do so well as the other 
three sequences up to about 1500 points. However as the number of points 
is increased, these sequences perform better than either the pseudo- 
random generator or those based on roots of primes. 

Behavior of this sort is not unexpected. The radical inverse sequences 
in 9 dimensions use <f>^g (m) and ^23^ which are strongly correlated until 
m becomes large. As one goes to higher dimensions this behavior is in- 
tensified such that the radical inverse sequences will start decreasing 
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slowly but not level out so fast as the other sequences do. Figure 11 
shows that all these sequences have much "fine structure" so that detailed 
theoretical results will be very difficult to obtain. The monotone de- 
crease with N of sequences 2 and 4 is surprising in view of the erratic 
behavior of the other sequences. Although the "good lattice" points of 
Zaremba were constructed for integration of periodic functions, they too 
have low discrepancies. 

The sets of points, + , which were computed from Sequence 4 are the 
best so far obtained for N larger than about 100. Direct minimization 
of J(X) is a difficult and costly procedure. Without more theoretical 
knowledge about the behavior of J(X) as a fucntion of the coordinate 
values of X , it seems that the direct approach will not yield signifi- 
cant results on a large scale, but it can be used to refine any set of 
point to give a slightly better one. 

In comparing one sequence to another, it is not so much the vertical 
difference in T(X) at a given N that is important $ but the number of 
points required to give a given T(X) . For example in 3 dimensions, 
to achieve the same T(X) as Sequence 4 has at 200 points, Sequence 2 
needs 237 points, but to equal Sequence 4 at 800 points. Sequence 2 
requires 988 points. The difference in T(X) is less around 700 points 
than around 200 points, but it takes more points Sequence 2 to make it 
up. 

All the sequences studied here have as good or better behavior as a 
theoretical random sequence. Perhaps the low discrepancy of the pseudo- 
random number generator in several dimensions explains why computations 



based on such a generator yield good results even though there may be no 
justification for the probabilistic bounds usually employed. 
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FIGURE 11. T(X) vs. N from 150 to 190 points in 4 dimensions. 
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